Multiagent Learning with a Noisy Global Reward Signal
Authors
Abstract
Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze, which makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent “bar problem”, a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be learned either on- or off-line using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25% over learning using the global reward directly.
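The abstract's core idea can be illustrated with a minimal sketch: fit a function approximator to noisy observations of the global reward G, then use the model to compute each agent's difference reward D_i = G(z) − G(z with agent i's action replaced by a default), without any extra queries to the noisy real system. All names here (`g_hat`, `difference_reward`, the congestion-style reward) are illustrative assumptions, not the paper's actual domains or code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents = 5
n_samples = 200

# Assumed stand-in for the true (hidden) global reward: agents' features sum
# into a congestion-style payoff that saturates as total usage grows.
def true_global_reward(x):
    return float(np.sum(x) - 0.1 * np.sum(x) ** 2)

# Collect (joint-action features, noisy global reward) samples off-line.
X = rng.uniform(0.0, 1.0, size=(n_samples, n_agents))
y = np.array([true_global_reward(x) for x in X])
y += rng.normal(0.0, 0.05, size=n_samples)  # noisy global signal

# Fit a linear-in-features model G_hat by least squares.
def features(x):
    return np.concatenate([x, [np.sum(x) ** 2, 1.0]])

Phi = np.array([features(x) for x in X])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def g_hat(x):
    return float(features(x) @ w)

# Difference reward for agent i, computed entirely from the learned model:
# replace agent i's contribution with a fixed default action and subtract.
def difference_reward(x, i, default=0.0):
    x_cf = x.copy()
    x_cf[i] = default
    return g_hat(x) - g_hat(x_cf)

x = rng.uniform(0.0, 1.0, size=n_agents)
d = [difference_reward(x, i) for i in range(n_agents)]
```

Because the counterfactual term G(z_{−i}) is evaluated on the cheap learned model rather than by re-running the system, each agent gets a low-noise local signal at negligible cost, which is the speed-up the paper's approach relies on.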
Similar references
CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning
Coordinating the joint-actions of agents in cooperative multiagent systems is a difficult problem in many real world domains. Learning in such multiagent systems can be slow because an agent may not only need to learn how to behave in a complex environment, but also to account for the actions of other learning agents. The inability of an agent to distinguish between the true environmental dynam...
D++: Structural credit assignment in tightly coupled multiagent domains
Autonomous multiagent teams can be used in complex exploration tasks to both expedite the exploration and improve the efficiency. However, use of multiagent systems presents additional challenges. Specifically, in domains where the agents' actions are tightly coupled, coordinating multiple agents to achieve cooperative behavior at the group level is difficult. In this work, we demonstrate that ...
CLEAN rewards for improving multiagent coordination in the presence of exploration
In cooperative multiagent systems, coordinating the joint-actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process, where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true envir...
CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning
Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent’s reward signal. This learning noise can have unfore...
CLEANing the reward: counterfactual actions to remove exploratory action noise in multiagent learning (extended abstract)
Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent’s reward signal. This learning noise can have unfore...